From: fdc@columbia.edu (Frank da Cruz)
Subject: Re: code pages/character set
Date: 24 Feb 2001 16:52:40 GMT
Organization: Columbia University
Message-ID: <978oso$dd7$1@newsmaster.cc.columbia.edu>
To: kermit.misc@columbia.edu
In article <9ZOl6.8778$Sv5.88347@wagner.videotron.net>,
Patrick St-Jacques <pstjac@videotron.ca> wrote:
: Hi everybody, I have a problem that needs immediate assistance. I work for
: the Canadian customs agency, administering the e-commerce platform. We
: receive EDI transmissions for all electronic forms coming into Canada, then
: send this data to be processed by the mainframe.
: Now my problem is this: because of the way the transactions are sent, when they
: get to our Solaris box we have no clue what code page or character set we
: are receiving. Our system expects code page 819 (ISO standard), but some of our
: clients send their data using 850 (DOS French) or 437 (DOS US) or even
: special code pages.
:
: Now my question is this: is there a utility in Solaris 7 that can determine
: what code page a file is using?
:
No. It is possible to tell if a file is 7-bit or 8-bit. If it has 8-bit
bytes, anything that you can tell about it is a matter of probability and
statistics, not certainty. It can be determined with a fair amount of
reliability whether it is text or binary. If it is text, it can be determined
whether it is UTF-8, UCS-2 (or -16), or some 8-bit character set. It is
virtually impossible without some form of natural language recognition to
tell one 8-bit character set ("code page") from another by inspection.
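To make those distinctions concrete, here is a rough Python sketch. It is a hypothetical heuristic (the name `classify` and the control-character rule are assumptions, not an existing Solaris utility). It can separate 7-bit text, likely binary, UCS-2/UTF-16, and UTF-8, but for any other 8-bit data it can only report that the character set is unknown:

```python
import codecs


def classify(data: bytes) -> str:
    """Heuristically classify a byte string (hypothetical helper).

    Distinguishes 7-bit text, binary, UCS-2/UTF-16 and UTF-8, but
    cannot say WHICH 8-bit character set ("code page") is in use.
    """
    # A byte-order mark is a strong indicator of UCS-2/UTF-16.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UCS-2/UTF-16"
    # Without a BOM, Latin-script UCS-2 text has a NUL in every other byte.
    if len(data) >= 2 and len(data) % 2 == 0:
        if not any(data[0::2]) or not any(data[1::2]):
            return "UCS-2/UTF-16 (no BOM, guessed)"
    # NULs and most other C0 control codes rarely occur in plain text.
    text_controls = {0x09, 0x0A, 0x0C, 0x0D}  # TAB, LF, FF, CR
    if any(b < 0x20 and b not in text_controls for b in data):
        return "binary"
    # 7-bit vs 8-bit is the one distinction that is certain.
    if all(b < 0x80 for b in data):
        return "7-bit text"
    # Strict UTF-8 decoding fails on almost all non-UTF-8 8-bit text.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "8-bit text, character set unknown"
    return "UTF-8"


if __name__ == "__main__":
    sample = "Québec".encode("cp850")  # é is byte 0x82 in CP850
    print(classify(sample))
    # The same bytes decode without error as CP437, CP850 and ISO 8859-1,
    # so inspection alone cannot say which one the sender used.
    for codepage in ("cp437", "cp850", "latin-1"):
        print(codepage, repr(sample.decode(codepage)))
```

The last loop shows the core problem: CP437 and CP850 both yield "Québec", and ISO 8859-1 yields a C1 control character instead of the accent. All three decode without error, so nothing in the bytes themselves says which one was intended.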
: Just a quick rundown of how we get the data: we use X.400 transport and the
: data is encrypted, so we cannot convert the code page when we get it since
: it's encrypted; we want to check and convert (if necessary) after decryption.
: We need to know this because we use IBM MQSeries to transport our outside
: world data to our applications (which are on the mainframe). I know MQSeries
: knows how to convert, but it has to know what the original code page is, and
: because it is running on Solaris it assumes 819.
:
Your setup is wrong from the beginning. A common intermediate representation
should be used for text on the wire. This is a fundamental principle of
communication protocols. It is the responsibility of the sender
to convert its local text format and character set to the common one for
transmission.
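A minimal Python sketch of that principle, assuming ISO 8859-1 as the agreed transfer character set and using hypothetical helper names `to_wire` and `from_wire`. Each side converts only between its own local character set, which it always knows, and the common one, so the receiver never has to guess:

```python
# Agreed transfer character set (assumption): ISO 8859-1, IBM code page 819.
WIRE_CHARSET = "latin-1"


def to_wire(local_bytes: bytes, local_charset: str) -> bytes:
    """Sender: convert text from its own character set to the common one."""
    # errors="strict" (the default) raises instead of silently corrupting
    # characters that the transfer character set cannot represent.
    return local_bytes.decode(local_charset).encode(WIRE_CHARSET)


def from_wire(wire_bytes: bytes, local_charset: str) -> bytes:
    """Receiver: convert text from the common character set to its own."""
    return wire_bytes.decode(WIRE_CHARSET).encode(local_charset)


if __name__ == "__main__":
    text = "Montréal"
    # Two senders with different local code pages...
    from_dos_us = to_wire(text.encode("cp437"), "cp437")
    from_dos_fr = to_wire(text.encode("cp850"), "cp850")
    # ...produce identical bytes on the wire.
    print(from_dos_us == from_dos_fr)
```

Characters with no ISO 8859-1 equivalent (CP437's box-drawing characters, for instance) make `to_wire` raise `UnicodeEncodeError` in this sketch; a real protocol has to decide whether to reject, substitute, or use a larger transfer character set.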
Kermit protocol and software have been doing this since the 1980s:
http://www.columbia.edu/kermit/
In your example, the client would use Kermit to send its code page 437 or 850
data from a PC (via dialup or TCP/IP) and convert it to ISO 8859-1 as part of
the transfer. Your Sun could use Kermit to send the ISO 8859-1 data to the
IBM mainframe, whose Kermit program would convert it to the required IBM
Country Extended Code Page (CECP), or you could continue to use the IBM MQ
method for that stage.
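The two-stage path just described can be traced with a short Python sketch. It uses Python's built-in codecs, not Kermit's own conversion tables, and it assumes EBCDIC code page 037 on the mainframe, since the post does not say which CECP is in use; `HOPS`, `convert`, and `run_chain` are hypothetical names:

```python
from typing import List

# Each hop names its source and target character sets explicitly, so no
# program along the way has to guess. CP037 on the mainframe is an assumption.
HOPS = [
    ("cp850", "latin-1"),  # client PC -> Sun: DOS CP850 to ISO 8859-1
    ("latin-1", "cp037"),  # Sun -> IBM mainframe: ISO 8859-1 to EBCDIC CP037
]


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode text bytes from one character set to another (strict)."""
    return data.decode(src).encode(dst)


def run_chain(data: bytes, hops=HOPS) -> List[bytes]:
    """Apply each hop in turn and return the bytes seen at every stage."""
    stages = [data]
    for src, dst in hops:
        stages.append(convert(stages[-1], src, dst))
    return stages


if __name__ == "__main__":
    stages = run_chain("Québec".encode("cp850"))
    for label, stage in zip(("PC", "Sun", "mainframe"), stages):
        # "é" is 0x82 in CP850, 0xE9 in ISO 8859-1, and 0x51 in CP037
        print(f"{label:10} {stage.hex(' ')}")
```

Keeping the hops as data makes it easy to swap one stage, for example handing the second hop to IBM MQ instead, without touching the first.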
Many EDI applications use Kermit as the transport, some of them for
exactly this reason, as well as because it is independent of the platform
and the communication method. Now you can also use the Kermit FTP client:
http://www.columbia.edu/kermit/ftpclient.html
in the same way. It's the first and, to my knowledge, only FTP client that
converts character sets. It also allows for secure, encrypted transfers.
- Frank